import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import nltk
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import seaborn as sns
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
import json
import plotly.offline as py
py.init_notebook_mode(connected=True)
import plotly.graph_objs as go
import plotly.figure_factory as ff
/opt/conda/lib/python3.9/site-packages/geopandas/_compat.py:111: UserWarning: The Shapely GEOS version (3.10.3-CAPI-1.16.1) is incompatible with the GEOS version PyGEOS was compiled with (3.10.4-CAPI-1.16.2). Conversions between both will be slow.
pdaps_path = "data/pdaps.csv"
df = pd.read_csv(pdaps_path)
The dataset is a survey of state implementation of nalaxone access and immunity laws over time generated by the Prescription Drug Abuse Policy System (PDAPS), a data and research hub maintained by the Center for Public Health Law Research at the Temple University Beasley School of Lawby with funding from the National Institute on Drug Abuse. This data set in particular also received support from the Opioid Research Workgroup from the University of Florida College of Pharmacy. It examines policies from January 1, 2001 to January 1, 2022.
This data was generated with the intent to map the previous and current state of naloxone prescription and dispensing regulation policies, as well as to inform legal researchers and future policy. This data is directly applicable to our research, as it contains much of the direct information about opioid overdose prevention policies that we plan to assess through this project.
The policies included in the survey include those such as "Does the Jurisdiction have a nalaxone access law?" and "Do prescribers have immunity from civil liability for prescribing, dispensing or distributing naloxone to a layperson?" Each question is represented as a row in the dataset, and the responses of "Yes/No/Not Applicable (or no Data) are coded as "1, 0, 2" in the dataset, respectively.
The raw dataset has 5 rows and 36 columns, with each row representing the policy environment in a given state during the time range specified by the Effective Date and Valid Through Date variables. Each change in time period represents an ammendment to the state's Nalaxone access and immunity laws -- For example, from 2001/01/01 to 2015/06/07, Alabama did not have a policy permitting access to Nalaxone, but the next row reflects how the law was ammended on 2015/06/08 to permit access.
A strength of the dataset is that it is quite clean (apart from the variable names that can be quickly recoded), since all the categorical variables are nicely one hot encoded. A weakness is the high proportion of null values from 2001 to around 2015--this is not intrinsic to the dataset, but rather a consequence of the fact that most states did not begin adopting naloxone immunity provisions until the early 2010s.
This data was collected ethically - state policies are public record, as are legal policies regulating medical professionals. No surveying, testing, or otherwise assessments with capacity for unethical conduct were required for the collection of this data.
df.head()
| Jurisdictions | Effective Date | Valid Through Date | naaddressoaayn | nahealthcrimproyn | nanapimm1yn | narcimm1yn | nahealthcivliayn | nanapimm2yn | narcimm2yn | ... | pharmacist-dispensing-method_Directly authorized by legislature | naimmcrimprolpyn | nanapimm3yn | narcimm3yn | naimmcivlialpyn | nanapimm4yn | narcimm4yn | naloxone-crimpossesion | naloxone-crimpossessionprog | crim-possess-reasonable | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Alabama | 2001-01-01 | 2015-06-07 | 0 | 2 | 2 | 2 | 2 | 2 | 2 | ... | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| 1 | Alabama | 2015-06-08 | 2016-05-09 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 2 | 2 |
| 2 | Alabama | 2016-05-10 | 2022-01-01 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 2 | 2 |
| 3 | Alaska | 2001-01-01 | 2016-03-14 | 0 | 2 | 2 | 2 | 2 | 2 | 2 | ... | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| 4 | Alaska | 2016-03-15 | 2017-03-21 | 1 | 0 | 2 | 2 | 1 | 0 | 0 | ... | 0 | 0 | 2 | 2 | 1 | 0 | 0 | 0 | 2 | 2 |
5 rows × 36 columns
The 33 policy variables, denoted by the columns in the table, correlate to 12 primary questions and their related sub-questions. Currently, we believe that all of these variables could be pertinent within our analysis.
The first question asks if the jurisdiction has a nalaxone access law. The next type of primary immunity questions are structured as such:
- Does [agent] have immunity from [liability] for prescribing, dispensing, or distributing naloxone to a layperson?
The [agent]s encompassed by the survey include:
While the [liability]s include:
In the data cleaning below, these 8 questions are coded as '[agent]-[liability]-immun'
The other 3 primary questions include:
- Are prescriptions of Naloxone authorized to third-parties?
- Are pharmacists allowed to dispense or distribute naloxone without a patient-specific prescription from another medical professional?
- Does the law remove criminal liability for possession of naloxone without a prescription?
For primary questions 1, 2, and 4, there are subquestions that ask if the [agent] is required to
For primary question 3, if the answer is Yes, the subquestions ask about the method(s) of distribution allowed by the law:
The first step of the cleaning process was to recode the variable names to more clearly reflect the survey questions as described above.
Then, we reshaped the data such that each row represents a state and the policies it had in place during a given year, so that it would match the granularity of the dependent variable (deaths per state in a given year, see eda-kff file). This resulted in a dataframe with 1071 rows (51 jurisdictions * 21 years of analysis between 2001 and 2021).
The NaN values were preserved so that we could first analyze the percentage of null values before imputing them.
#rename columns
new_cols = {
'naaddressoaayn': 'nal_access_law',
'nahealthcrimproyn': 'presc_crim_immun',
'nanapimm1yn': 'presc_crim_immun_nal_program',
'narcimm1yn': 'presc_crim_immun_reason_care',
'nahealthcivliayn':'presc_civil_immun',
'nanapimm2yn': 'presc_civil_immun_nal_program',
'narcimm2yn': 'presc_civil_immun_reason_care',
'naloxone-presprof': 'presc_profsanctions_immun',
'naloxone-dispcrim': 'disp_crim_immun',
'naloxone-dcrimpro': 'disp_crim_immun_nal_program',
'naloxone-dcrimcare': 'disp_crim_immun_reason_care',
'naloxone-dispciv': 'disp_civil_immun',
'naloxone-dcivprog':'disp_civil_immun_nal_program',
'naloxone-dcivcare': 'disp_civil_immun_reason_care',
'naloxone-presprof-disp': 'disp_profsanctions_immun',
'naloxone-thirdparty': 'third_party_presc',
'naloxone-thirdprog': 'third_party_presc_nal_program',
'naloxone-thirdcare': 'third_party_reason_care',
'pharmacist-dispensing': 'pharm_dispense_no_prescription',
'pharmacist-dispensing-method_Standing_order': 'pharm_dispense-standing_order',
'pharmacist-dispensing-method_Protocol order': 'pharm_dispense-protocol_order',
'pharmacist-dispensing-method_Naloxone-specific collaborative practice agreement':'pharm_dispense-collab_agree',
'pharmacist-dispensing-method_Pharmacist prescriptive authority':'pharm_dispense-presc_auth',
'pharmacist-dispensing-method_Directly authorized by legislature':'pharm_dispense-legislature',
'naimmcrimprolpyn': 'lay_crim_immun',
'nanapimm3yn':'lay_crim_immun_nal_program',
'narcimm3yn':'lay_crim_immun_reason_care',
'naimmcivlialpyn': 'lay_civil_immun',
'nanapimm4yn': 'lay_civil_immun_nal_progam',
'narcimm4yn': 'lay_civil_immum_reason_care',
'naloxone-crimpossesion': 'possess_crim_immun',
'naloxone-crimpossessionprog': 'possess_crim_immun_nal_program',
'crim-possess-reasonable': 'possess_crim_immun_reason_care'
}
df.rename(index=int, columns=new_cols, inplace=True)
from pandas.core.groupby.groupby import MultiIndex
from collections import defaultdict
split_dict = defaultdict(list)
#separate ranges of policy effective dates into individual years
for ind in df.index:
start_year, end_year = int(df['Effective Date'][ind].split('-')[0]), int(df['Valid Through Date'][ind].split('-')[0])
state = df['Jurisdictions'][ind]
for year in range(start_year, end_year+1):
policies = df.iloc[ind, 3:].values
split_dict[(year,state)]= policies
#rename columns and create new df from dict
split_cols = df.columns[3:]
df = pd.DataFrame.from_dict(split_dict, orient='index', columns = split_cols)
df.index = MultiIndex.from_tuples(df.index)
df = df.reset_index()
df.rename(columns={'level_0':'Year','level_1':'State'}, inplace=True)
df = df[df['Year'] < 2022]
print('data shape:', df.shape)
df.head()
df.replace(2, np.nan, inplace=True)
data shape: (1071, 35)
The histogram below shows a distribution of policy lengths. Apart from the high count of 0 values (indicating either null or no policy), we can observe a second peak at around 6-7 years. Again, this is consistent with our observation that states did not widely adopt immunity policies until around 2015. The outlier at 20 years is likely New Mexico, which was the first state to adopt a naloxone access law.
avg_len = df.groupby('State').agg('sum').iloc[:,1:]
avg_len.stack().plot.hist(title='Distribution of Policy Lengths (Years)')
<AxesSubplot:title={'center':'Distribution of Policy Lengths (Years)'}, ylabel='Frequency'>
As mentioned above, most states did not adopt naloxone immunity-related policies until the early 2010s--hence, around 61% of the data is currently null values.
#find percentage of null values in entire dataset
nulls = df.isna().sum().sum()
nulls/df.size
0.6115512871815393
Per the histogram below, most of the policy variables contain around 60-70% null values; with the nal-access-law variable being the only one with complete data.
null_df = df.groupby('Year').agg(lambda x: x.isna().sum()).iloc[:,1:]
annual_nulls = null_df.sum(axis = 1)
annual_totals = 51*33
policy_nulls = null_df.sum()
policy_totals = 51*21
(policy_nulls / policy_totals).sort_values(ascending=False).plot.hist(figsize = (10,5))
<AxesSubplot:ylabel='Frequency'>
policy_nulls.sort_values(ascending=True)
nal_access_law 0 disp_profsanctions_immun 637 disp_civil_immun 637 third_party_presc 637 presc_profsanctions_immun 637 disp_crim_immun 637 presc_civil_immun 637 lay_civil_immun 637 possess_crim_immun 637 presc_crim_immun 637 lay_crim_immun 637 pharm_dispense_no_prescription 637 pharm_dispense-legislature 713 pharm_dispense-presc_auth 713 pharm_dispense-collab_agree 713 pharm_dispense-protocol_order 713 pharmacist-dispensing-method_Standing order 713 third_party_presc_nal_program 714 third_party_reason_care 714 lay_civil_immun_nal_progam 719 lay_civil_immum_reason_care 719 disp_civil_immun_reason_care 734 disp_civil_immun_nal_program 734 presc_civil_immun_reason_care 740 presc_civil_immun_nal_program 740 lay_crim_immun_nal_program 763 lay_crim_immun_reason_care 763 presc_crim_immun_reason_care 770 presc_crim_immun_nal_program 770 disp_crim_immun_reason_care 775 disp_crim_immun_nal_program 775 possess_crim_immun_nal_program 961 possess_crim_immun_reason_care 961 dtype: int64
Between 2012 and 2015, the density of null values per year declines sharply--from 2015 onward, less than 30% of the data is null in a given year.
(annual_nulls / annual_totals).plot.line(ylabel="Proportion of Nulls", title='Null Proportion per Year', grid=True)
plt.xticks(range(2000,2022,4))
plt.show()
Ultimately, we decided to simply impute null values with 0 -- given that most states did not begin imputing naloxone immunity laws until 2015, it is reasonable to equate a lack of data with a lack of a policy.
df.fillna(0)
| Year | State | nal_access_law | presc_crim_immun | presc_crim_immun_nal_program | presc_crim_immun_reason_care | presc_civil_immun | presc_civil_immun_nal_program | presc_civil_immun_reason_care | presc_profsanctions_immun | ... | pharm_dispense-legislature | lay_crim_immun | lay_crim_immun_nal_program | lay_crim_immun_reason_care | lay_civil_immun | lay_civil_immun_nal_progam | lay_civil_immum_reason_care | possess_crim_immun | possess_crim_immun_nal_program | possess_crim_immun_reason_care | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2001 | Alabama | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 2002 | Alabama | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 2003 | Alabama | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 2004 | Alabama | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 2005 | Alabama | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1116 | 2017 | Wyoming | 1 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | ... | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1117 | 2018 | Wyoming | 1 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | ... | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1118 | 2019 | Wyoming | 1 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | ... | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1119 | 2020 | Wyoming | 1 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | ... | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1120 | 2021 | Wyoming | 1 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | ... | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1071 rows × 35 columns
In the following visualizations, we selected the 12 Primary Policy questions (as defined in the previous section) for analysis, and denoted them within the immun-pols variable
immun_pols = ['nal_access_law','presc_crim_immun','presc_civil_immun',
'presc_profsanctions_immun','disp_crim_immun','disp_civil_immun',
'disp_profsanctions_immun', 'third_party_presc','pharm_dispense_no_prescription',
'lay_crim_immun','lay_civil_immun', 'possess_crim_immun']
df21 = df[(df['Year']==2021)].loc[:,immun_pols]
pols_2021 = pd.DataFrame()
pols_2021['Yes'] = df21.sum()
pols_2021['No'] = df21.apply(lambda x: x==0).sum()
pols_2021.sort_values(by='Yes', ascending=False).plot.bar(stacked=True, figsize = (10,5), color = {'Yes':'#479cd1', 'No':'#ffe330'})
plt.xlabel('Policy', fontsize=12)
plt.ylabel('Number of States', fontsize=12)
plt.title('Adoption of Primary Policies in 2021', fontsize=16)
plt.show()
law1_count = df.groupby('Year').agg('sum', numeric_only=True).loc[:,immun_pols]
law1_count.plot.line(figsize = (10, 5), grid = True, xlabel = "Year", ylabel = "Num States", title='Adoption of Policies Over Time')
plt.xticks(range(2000,2022,4))
plt.show()
us_state_to_abbrev = {
"Alabama": "AL",
"Alaska": "AK",
"Arizona": "AZ",
"Arkansas": "AR",
"California": "CA",
"Colorado": "CO",
"Connecticut": "CT",
"Delaware": "DE",
"Florida": "FL",
"Georgia": "GA",
"Hawaii": "HI",
"Idaho": "ID",
"Illinois": "IL",
"Indiana": "IN",
"Iowa": "IA",
"Kansas": "KS",
"Kentucky": "KY",
"Louisiana": "LA",
"Maine": "ME",
"Maryland": "MD",
"Massachusetts": "MA",
"Michigan": "MI",
"Minnesota": "MN",
"Mississippi": "MS",
"Missouri": "MO",
"Montana": "MT",
"Nebraska": "NE",
"Nevada": "NV",
"New Hampshire": "NH",
"New Jersey": "NJ",
"New Mexico": "NM",
"New York": "NY",
"North Carolina": "NC",
"North Dakota": "ND",
"Ohio": "OH",
"Oklahoma": "OK",
"Oregon": "OR",
"Pennsylvania": "PA",
"Rhode Island": "RI",
"South Carolina": "SC",
"South Dakota": "SD",
"Tennessee": "TN",
"Texas": "TX",
"Utah": "UT",
"Vermont": "VT",
"Virginia": "VA",
"Washington": "WA",
"West Virginia": "WV",
"Wisconsin": "WI",
"Wyoming": "WY",
"District of Columbia": "DC",
}
cp_df = df.loc[:, ['State', 'Year']]
totals = df.loc[:, immun_pols].sum(axis=1)
cp_df['Policies'] = totals
cp_df.replace({'State':us_state_to_abbrev}, inplace=True)
cp_df
| State | Year | Policies | |
|---|---|---|---|
| 0 | AL | 2001 | 0.0 |
| 1 | AL | 2002 | 0.0 |
| 2 | AL | 2003 | 0.0 |
| 3 | AL | 2004 | 0.0 |
| 4 | AL | 2005 | 0.0 |
| ... | ... | ... | ... |
| 1116 | WY | 2017 | 11.0 |
| 1117 | WY | 2018 | 11.0 |
| 1118 | WY | 2019 | 11.0 |
| 1119 | WY | 2020 | 11.0 |
| 1120 | WY | 2021 | 11.0 |
1071 rows × 3 columns
import plotly.express as px
fig = px.choropleth(cp_df,
locations = 'State',
color= 'Policies',
animation_frame="Year",
color_continuous_scale="Sunset",
locationmode='USA-states',
scope="usa",
range_color=(0, 12),
title='Number of Primary Policies Implemented by Year',
height=600
)
fig.show(renderer="notebook")
corr_matrix = df[immun_pols].corr()
sns.heatmap(corr_matrix, annot=True)
plt.show()